Evaluating Matching Algorithms: The Monotonicity Principle (Position Statement)

Author

  • Avigdor Gal
Abstract

Traditionally, semantic reconciliation was performed by a human observer (a designer or a DBA) [8] due to its complexity [3]. However, manual reconciliation (with or without computer-aided tools) tends to be slow and inefficient in dynamic environments and does not scale for obvious reasons. The introduction of the semantic Web vision and the shift towards machine-understandable Web resources have therefore highlighted the importance of automatic semantic reconciliation, and new tools for automating the process, such as GLUE [4] and OntoBuilder [11], were introduced.

Generally speaking, the process of semantic reconciliation is performed in two steps. First, given two attribute sets A and A′ (denoted schemata) with n1 and n2 attributes, respectively, a degree of similarity is computed automatically for all attribute pairs (one attribute from each schema), using methods such as name matching, domain matching, structure matching (e.g., over XML hierarchical representations), and machine learning techniques. Second, a single mapping from A to A′ is chosen to be the best mapping; typically, the best mapping is the one that maximizes the sum (or average) of the pair-wise weights of the selected attribute pairs. We differentiate the best mapping from the exact mapping, the output of a matching process as would be performed by a human observer. Automatic matching may carry with it a degree of uncertainty, since “the syntactic representation of schemas and data do not completely convey the semantics of different databases” [10]. As an example, consider name matching, a common method in tools such as OntoBuilder [6], Protégé [5], and Ariadne [9]. With name matching, one assumes that similar attributes have similar (or even identical) names. However, the occurrence of synonyms (e.g., remuneration and salary) and homonyms (e.g., age referring to either human age or wine age) may trap this method into an erroneous mapping. As a consequence, there is no guarantee that the exact mapping is always the best mapping.
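To make the two-step process and the synonym trap concrete, here is a minimal Python sketch; it is our own illustration, not the OntoBuilder or GLUE implementation. All attribute names are invented, a naive string similarity stands in for the pair-wise weight computation of step one, and brute-force enumeration stands in for the mapping selection of step two (real tools would use bipartite matching instead).

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical schemata (attribute names invented for the example; real
# matchers also use domains, structure, and machine-learned features).
A = ["salary", "age"]
A_prime = ["remuneration", "sale", "age"]

def name_similarity(a, b):
    # Step 1 building block: a naive string similarity in [0, 1],
    # standing in for the pair-wise weight a real matcher would compute.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Step 1: a degree of similarity for every attribute pair
# (one attribute from each schema).
weights = {(a, b): name_similarity(a, b) for a in A for b in A_prime}

# Step 2: choose the best mapping -- the one-to-one mapping that
# maximizes the sum of the selected pair-wise weights.
best_mapping, best_score = None, float("-inf")
for perm in permutations(A_prime, len(A)):
    candidate = list(zip(A, perm))
    score = sum(weights[pair] for pair in candidate)
    if score > best_score:
        best_mapping, best_score = candidate, score

print(best_mapping)
# Prints [('salary', 'sale'), ('age', 'age')]: the synonym pair
# ('salary', 'remuneration') loses to the spurious string match
# ('salary', 'sale'), so the best mapping found here is not the
# exact mapping a human observer would produce.
```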
We present the monotonicity principle, a sufficient condition ensuring that the exact mapping is ranked sufficiently close to the best mapping. Roughly speaking, the monotonicity principle proclaims that by replacing a mapping with a better one, score-wise, one obtains a more accurate mapping (from a human observer’s point of view), even if, in doing so, some of the individual attribute mappings are of lower quality. We have demonstrated, through theoretical [7] and empirical [2] analysis, that for monotonic mappings, i.e., mappings that satisfy the monotonicity principle, one can safely interpret a high similarity measure as an indication that more attributes are mapped correctly. An immediate consequence of this result is a means of corroborating the quality of matching algorithms, based on their capability to generate monotonic mappings. We have experimented with a matching algorithm and report on our experiences in [2]; our findings indicate that matching algorithms that generate monotonic mappings are well-suited for automatic semantic reconciliation. Another outcome of the monotonicity principle is that a good automatic semantic reconciliation algorithm ranks the exact mapping relatively close to the best mapping, thus enabling an efficient search for the exact mapping [1].

Monotonicity is not defined in “operational” terms, since it is compared against an initially unknown exact mapping. In fact, such an operational definition may not be attainable in general, since algorithms may perform well only on some schema pairs. Therefore, a task for future research involves a possible classification of application types on which matching algorithms perform well.
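On benchmark schema pairs where the exact mapping is known in advance, monotonicity can be probed empirically. The sketch below is our own simplified framing, not the evaluation procedure of [2] or the formal definition of [7]: it orders candidate mappings by similarity score and checks that precision against the exact mapping never decreases. The candidate mappings and scores reuse the toy schemata above and are invented.

```python
def precision(mapping, exact):
    # Fraction of attribute pairs in `mapping` that the exact mapping
    # (the human observer's output) also contains.
    return len(set(mapping) & set(exact)) / len(mapping)

def is_monotonic(scored_mappings, exact):
    # Simplified empirical proxy for the monotonicity principle: when
    # candidate mappings are ordered by ascending similarity score,
    # precision w.r.t. the exact mapping should never decrease.
    # (The formal definition in [7] is stated differently.)
    ordered = sorted(scored_mappings, key=lambda pair: pair[1])
    precisions = [precision(m, exact) for m, _ in ordered]
    return all(p <= q for p, q in zip(precisions, precisions[1:]))

# Toy benchmark: the exact mapping plus three scored candidates.
exact = [("salary", "remuneration"), ("age", "age")]
candidates = [
    ([("salary", "age"), ("age", "sale")], 0.50),
    ([("salary", "remuneration"), ("age", "age")], 1.22),
    ([("salary", "sale"), ("age", "age")], 1.60),
]
print(is_monotonic(candidates, exact))
# Prints False: the top-scored mapping is less precise than the
# runner-up, so this toy name matcher violates monotonicity on
# this schema pair.
```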


Similar Resources

Evaluating Matching Algorithms: the Monotonicity Principle

In this paper we present the monotonicity principle, a sufficient condition to ensure that the exact mapping, a mapping as would be performed by a human observer, is ranked close to the best mapping, as generated automatically by a matching algorithm. The research is motivated by the introduction of the semantic Web vision and the shift towards machine understandable Web resources. We support the i...


Belief-Revision, the Ramsey Test, monotonicity, and the so-Called Impossibility Results

Peter Gärdenfors proved a theorem purporting to show that it is impossible to adjoin to the AGM-postulates for belief-revision a principle of monotonicity for revisions. The principle of monotonicity in question is implied by the Ramsey test for conditionals. So Gärdenfors’ result has been interpreted as demonstrating that it is impossible to combine the Ramsey test for conditionals with the ba...


AO2P: Ad Hoc On-Demand Position-Based Private Routing Protocol

Privacy is needed in ad hoc networks. An ad hoc on-demand position-based private routing algorithm, called AO2P, is proposed for communication anonymity. Only the position of the destination is exposed in the network for route discovery. To discover routes with the limited routing information, a receiver contention scheme is designed for determining the next hop. Pseudo identifiers are used for...


Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching

Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...


Optimal matching problem in detection and recognition performance evaluation

This paper proposes a principle of one-to-one correspondence in performance evaluation of a general class of detection and recognition algorithms. Such a correspondence between ground-truth entities and algorithm declared entities is essential in accurately computing objective performance measures such as the detection, recognition, and false alarm rates. We mathematically define the corresponde...



Publication year: 2003